ABSTRACT

Beyond the linear regime of structure formation, part of the cosmological information
encoded in galaxy clustering becomes inaccessible to the usual power spectrum.
Sufficient statistics, A*, were introduced recently to recapture the lost, and ultimately
extract all, cosmological information. We present analytical approximations for the
A* and traditional power spectra as well as for their covariance matrices in order to
calculate analytically their cosmological information content in the context of Fisher
information theory. Our approach allows the precise quantitative comparison of the
techniques with each other and to the total information in the data, and provides
insights into sufficient statistics. In particular, we find that while the A* power
spectrum has a similar shape to the usual galaxy power spectrum, its amplitude is strongly
modulated by small scale statistics. This effect is mostly responsible for the ability of
the A* power spectrum to recapture the information lost for the usual power
spectrum. We use our framework to forecast the best achievable cosmological constraints
for projected surveys as a function of their galaxy density, and compare the
information content of the two power spectra. We find that sufficient statistics extract all
cosmological information, resulting in a gain of approximately a factor of 2 for dense
projected surveys at low redshift. This increase in the effective volume of projected
surveys is consistent with previous numerical calculations.

Key words: cosmology: large-scale structure of the Universe, methods: analytical,
methods, cosmology: cosmological parameters


1 INTRODUCTION 

Within the current inflationary paradigm of cosmology, the
small initial density fluctuations are believed to be very close
to Gaussian. The most natural observables of such
a field, the two-point statistics, lose some of their statistical
power as non-linear gravitational growth induces correlations
between Fourier modes (Rimes & Hamilton 2005;
Neyrinck et al. 2006). These correlations, especially those
between large and small scales, diminish the amount of information
accessible to these two-point statistics. A fraction
of this hidden information is accessible to higher-order N-point
statistics (e.g., Peebles 1980; Szapudi 2009). They are,
however, not only difficult to measure and interpret due to a
combinatorial explosion of complexity, but they also fail to capture
all available cosmological information, increasingly so
on more non-linear scales (Carron & Neyrinck 2012; Carron
& Szapudi 2013 and references therein).

Non-linear transformations, such as the logarithmic
mapping (Neyrinck et al. 2009) or variants thereof (Seo
et al. 2011; Joachimi & Taylor 2011), were introduced specifically
to retrieve the total information content of the matter
field. Carron & Szapudi (2013) defined sufficient statistics as
observables extracting all cosmological information from the
data. They demonstrated in the context of perturbation
theory and N-body simulations that the logarithmic
transformation, $A = \ln(1 + \delta)$, approximates well the exact
sufficient statistics of the dark matter field. Note that in the
case of a continuous lognormal field A is the exact sufficient
statistic, a statement supported by analytical calculations
and measurements in simulations (Carron et al. 2014a and
references therein).

In a previous work, Carron & Szapudi (2014) introduced
the local non-linear transformation A* as the optimal observable
to extract the information content of galaxy count
maps. Its spectrum recaptures the total available
cosmological information in the presence of shot noise. The new
observable has been characterized in detail using numerical
simulations of 2-dimensional survey configurations (Wolk
et al. 2014; Carron et al. 2014a). Yet, the precise manner in
which A* recaptures the cosmological information remained
somewhat of a puzzle, given that the shape of its power spectrum
closely resembles that of the galaxy power spectrum. In this work, we present an
analytical theory of the total information content (i.e. the
constraining power) of the A* angular power spectrum for
cosmological parameters, assuming that we have access to
the power spectrum and its derivatives as a function of cosmological
parameters; this is provided by a standard Boltzmann
code, such as CAMB (Lewis et al. 2000). Our approach
is then used to compare sufficient statistics with the usual
angular galaxy power spectrum as a function of the relevant
projected survey characteristics, most importantly the shot-noise
level. The analytical approach provides insight into the
workings of sufficient statistics; in particular, it sheds light
on the crucial role played by the bias of the non-linear transformation
in recapturing the lost information.

To test its validity, we carefully compare our model to the
predictions from our previous numerical simulations. For
simplicity of expression, we will designate these numerical
results as “exact” throughout this work. In practice,
these simulations provide accurate enough results that this
nomenclature is justified. Throughout this paper, the notation
P(k) is used to designate the angular power spectrum
with $k \simeq \ell + 1/2$ in the flat sky approximation.

Section 2 describes the analytical ansatz for the A* bias
and covariance matrix. Section 3 describes the model
for the 2-dimensional matter field covariance matrix. With
this model, we estimate the information content for different
survey densities in Section 4, where our estimates
are also compared to the previous numerical predictions of Wolk
et al. (2014). We summarize and conclude with a discussion
in Section 5.


2 ANSATZ FOR THE A* BIAS AND 
COVARIANCE MATRIX 

Assuming that the galaxy counts Poisson sample an underlying
lognormal galaxy field, let $N = (N_1, \cdots, N_{n_{\rm cells}})$ be a
map of galaxy counts. In the following, $n_{\rm cells} = 128^2$, for a
two-dimensional map. Given a sampling rate $\bar{N}$, the mapping
from N to A* is defined by the non-linear equation
(Carron & Szapudi 2014):


$$ A^* + \bar{N} \sigma_A^2 e^{A^*} = \sigma_A^2 \left( N - \frac{1}{2} \right), \qquad (1) $$


where $\sigma_A^2 = \ln(1 + \sigma_{\delta_g}^2)$, with $\sigma_{\delta_g}^2$ the variance of the galaxy
field fluctuations at the cell scale. These A*-mapping parameters
are estimated using our fiducial cosmology and are
then kept fixed. Given the current precision of cosmological
parameters, this assumption amounts to no practical limitation
for our technique.
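To make the mapping concrete, the following is a minimal Python sketch that solves the non-linear equation for A* cell by cell. It assumes the form of Equation 1 given in Carron & Szapudi (2014), $A^* + \bar{N} \sigma_A^2 e^{A^*} = \sigma_A^2 (N - 1/2)$. The function name `a_star`, the bisection solver and the numerical values are illustrative choices and not part of the original analysis.

```python
import math

def a_star(n_counts, n_bar, sigma2_a, iters=200):
    """Solve A + n_bar*sigma2_a*exp(A) = sigma2_a*(n_counts - 0.5) for A by bisection."""
    rhs = sigma2_a * (n_counts - 0.5)

    def f(a):
        # Left-hand side minus right-hand side; strictly increasing in a.
        return a + n_bar * sigma2_a * math.exp(a) - rhs

    hi = rhs                                      # f(hi) = n_bar*sigma2_a*exp(rhs) > 0
    lo = rhs - n_bar * sigma2_a * math.exp(rhs)   # f(lo) < 0 because lo < rhs
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

sigma2_a = math.log(1.0 + 0.8)   # hypothetical cell variance sigma_delta^2 = 0.8
for n in (0, 1, 5, 20):
    print(n, round(a_star(n, n_bar=5.0, sigma2_a=sigma2_a), 4))
```

Because the left-hand side is monotonic in A*, the root is unique, so any bracketing solver would do; bisection is used here only for robustness.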

Wolk et al. (2014), using a simulation pipeline calibrated
on the Canada-France-Hawaii Telescope Legacy Survey
(CFHTLS) data, have shown that the information gain
from using the mean and spectrum of A* instead of the galaxy
power spectrum on the three cosmological parameters $\Omega_m$,
$\sigma_8$ and $w_0$ is up to about a factor of 2, especially at low
redshifts and for dense surveys. This numerical approach
clearly demonstrated that the “sufficient statistics” A* performs
better, yet it could not yield qualitative insights into the
workings of sufficient statistics. Given that the shape of the
A* power spectrum is very similar to the usual power spectrum,
the question naturally arises: is the increase of information
due to the A* spectrum itself, more precisely to its derivatives
being more sensitive to parameters, or to the fact that the
corresponding covariance matrix is better behaved, in particular
more diagonal? In our analytical approach, presented next, we
point out the crucial cosmological dependence of the bias,
and show that part of the information gain can in fact be
attributed to the derivatives of the bias with respect to cosmological
parameters.


2.1 From the galaxy power spectrum to the 
A-power spectrum 

Our analytical approach assumes prior knowledge of the
galaxy power spectrum $P_{\delta_g} = P$ as a function of cosmological
(and halo) parameters. We use the standard, unweighted
power spectrum estimator:

$$ \hat{P}(k) = \frac{1}{V N_k} \sum_{\mathbf{k}} |\tilde{\delta}_g(\mathbf{k})|^2, \qquad (2) $$


with V the survey volume and where the sum runs over the
$N_k$ Fourier modes associated with the k-th power spectrum
bin. This simple estimator is optimal for simple geometries,
such as N-body simulations. Including complications from
survey geometry and the corresponding optimal weighting
of the estimator will not change any of our results, as the
scales we are focussing on are small enough that edge effects
become unimportant. We model the galaxy clustering
with the Halo Occupation Distribution (HOD) description of
Wolk et al. (2014) and the CosmoPMC package. We consider
four different redshift bins: 0.2 < z < 0.4, 0.4 < z < 0.6,
0.6 < z < 0.8 and 0.8 < z < 1.0. Figure 1 shows the A*-
and galaxy power spectra for the redshift bin 0.6 < z <
0.8 with their $1\sigma$ confidence regions (shaded) as well as the
predictions of the power spectra for the underlying fields $\delta_g$
and $A = \ln(1 + \delta_g)$ (blue dotted lines).

The first step is to model the bias between the spectra
of the two continuous fields $\delta_g$ and $A = \ln(1 + \delta_g)$. Here
we assume a lognormal underlying galaxy density, a hypothesis
that has been shown to be very accurate in 2D
(Carron et al. 2014a). Then the simplest Ansatz to consider
is the ratio of the variances, and for the lognormal model
the variances are related as $\sigma_A^2 = \ln(1 + \sigma_{\delta_g}^2)$. Explicitly, we
assume

$$ P_A = b_A^2 P, \qquad (3) $$

where:

$$ b_A^2 = \frac{\sigma_A^2}{\sigma_{\delta_g}^2} = \frac{\sigma_A^2}{e^{\sigma_A^2} - 1}. \qquad (4) $$
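As a quick numerical illustration of the variance-ratio Ansatz, the sketch below evaluates $b_A^2 = \ln(1 + \sigma_{\delta_g}^2) / \sigma_{\delta_g}^2$ for a few cell variances. The function name and the sample variances are hypothetical; only the lognormal relation between the two variances is taken from the text.

```python
import math

def lognormal_bias2(sigma2_delta):
    """b_A^2 = sigma_A^2 / sigma_delta^2 with sigma_A^2 = ln(1 + sigma_delta^2)."""
    return math.log1p(sigma2_delta) / sigma2_delta

for s2 in (0.01, 0.1, 0.5, 1.0, 2.0):   # hypothetical cell variances
    print(f"sigma_delta^2 = {s2:4.2f}  b_A^2 = {lognormal_bias2(s2):.4f}")
```

The bias tends to unity for small variances and decreases as the field becomes more non-linear, which is why it carries a dependence on any parameter that changes the cell variance.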

According to the left panel of Figure 3, this approximation
is accurate to better than 4% over the whole k-range, even if
it starts to deviate slightly at both very large and very small
scales. This formula, in the regime of low A-variance, reduces
to the exponential bias formula of Neyrinck et al. (2009), obtained
from simulations for the 3-dimensional power spectrum.

Figure 1. Predictions of both the galaxy angular power spectrum
$P_g(k)$ and the non-linear transform A* power spectrum $P_{A^*}(k)$
in the redshift bin 0.6 < z < 0.8. The grey areas show the $1\sigma$ confidence
regions. The blue dotted lines represent the power spectra
of the underlying fields $\delta_g$ and $A = \ln(1 + \delta_g)$ and thus illustrate
the effect of shot noise.


2.2 From continuous to discrete fields 

The link between the continuous and discrete galaxy field
power spectra is well understood for Poisson sampling
through $P_g = P + 1/\bar{n}$, where $\bar{n}$ is the density of the considered
survey, related to the sampling rate via $\bar{n} = \bar{N} n_{\rm cells}/V$.
V is the survey volume and is fixed here to the size of the
CFHTLS-W1 field, L = 7.46 degrees on a side. As will
be explained in more detail in Section 3, considering the
local lognormal case results in a cancellation of the contribution
from the super-survey modes in the galaxy power
spectrum covariance matrix, hence implying that the information
content does not depend on the survey geometry.
How, then, can the relationship between the
A- and the A* power spectra be explained?
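The conversion between the sampling rate per cell and the survey density, and the resulting shot-noise term, can be sketched as follows. This is a minimal example that assumes the survey "volume" V is the solid angle of a square flat-sky field of side L, expressed in steradians; the function names and the value of the sampling rate are illustrative.

```python
import math

def number_density(n_bar_per_cell, n_cells, side_deg):
    """Galaxies per steradian: n = N_bar * n_cells / V, with V = L^2 (flat sky)."""
    side_rad = math.radians(side_deg)
    return n_bar_per_cell * n_cells / side_rad**2

def galaxy_power(p_continuous, n_density):
    """Poisson sampling adds a constant 1/n to the continuous-field spectrum."""
    return p_continuous + 1.0 / n_density

# Hypothetical sampling rate of 5 galaxies per cell on a 128^2 grid.
n = number_density(n_bar_per_cell=5.0, n_cells=128**2, side_deg=7.46)
print(f"n = {n:.3e} per steradian, shot noise 1/n = {1.0 / n:.3e}")
```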

From Equation 1 it can be expected that this relationship
depends on the A*-mapping parameters and especially on
$\bar{N}$ as, at a particular redshift and $n_{\rm cells}$, $\sigma_A^2$ is fixed. Figure 2
shows the scatter plot of A* as a function of A for
two different values of $\bar{N}$, the first one corresponding to the
sampling rate of the CFHTLS-W1 field in the redshift bin
0.6 < z < 0.8.

When the survey is dense, i.e. when $\bar{N}$ is large enough,
there is almost no bias between A and A*, meaning that the
local transformation A* traces the underlying field well. In
contrast, for a low density survey, A* tends to be smaller
than A, leading to smaller fluctuations and thus less power in A*,
in agreement with the simulations in Figure 1. Hence most
of the bias is due to the fact that, for low $\bar{N}$, A* cannot
distinguish between a low-A region and a cell that happens to
be empty due to a low $\bar{N}$ (a non-local generalization of A*
would potentially behave better; Carron & Szapudi 2013).

The next step is to relate the galaxy power spectrum
to the A* power spectrum. In order to take the shot-noise
contribution into account, we expand the exponential term
of Equation 1 to second order around zero
and then take the Fourier transform:
r n 2 


4 + 


Pa- =Pn = N^Pg = 4 + ^] • 


( 5 ) 


Thus the bias has a simple form: 



( 6 ) 


where 

The right panel of Figure 3 shows the A*- and galaxy power
spectra as well as the prediction from Equation 5. The accuracy
is better than 0.5% over the whole k-range.


2.3 The A* covariance matrix 


To quantify its Fisher information content, we need an estimate
of the A*-covariance matrix:

$$ {\rm Cov}^{A^*}_{ij} = \langle \hat{P}_{A^*}(k_i) \hat{P}_{A^*}(k_j) \rangle - \langle \hat{P}_{A^*}(k_i) \rangle \langle \hat{P}_{A^*}(k_j) \rangle. \qquad (8) $$

We found previously that a diagonal Gaussian covariance
provides an accurate model:

$$ {\rm Cov}^{A^*}_{ij} = \frac{2}{N_{k_i}} P_{A^*}(k_i) P_{A^*}(k_j) \delta_{ij}, \qquad (9) $$

where the A*-power spectrum is given by Equation 5. This
is further motivated by the fact that i) Carron & Szapudi
(2013) have shown that non-linear transformations tend to Gaussianize
the field, ii) in our model of a lognormal underlying
distribution, it would be exact in the absence of shot noise
(i.e. when $\bar{N}$ goes to infinity) and iii) taking shot noise into
account tends to increase the diagonal part of the covariance
matrix, adding an extra term $(1/\bar{n})^2$.

The left panel of Figure 4 shows the diagonal of the
matrix obtained using Equation 9 as a function of the exact
value. The agreement between the two quantities is almost perfect
on the diagonal. The middle panel represents the
comparison between the approximate (lower right) and the
exact (upper left) values of the normalised A*-covariance
matrix. The analytical formula reproduces the exact prediction
at the 10% level or better. To quantify the impact of
these discrepancies in the non-diagonal terms, we can consider
the squared cumulative signal-to-noise for A*, defined as

$$ (S/N)^2 = \sum_{k_i, k_j \le k_{\rm max}} P_{A^*}(k_i) \left[ {\rm Cov}^{A^*} \right]^{-1}_{ij} P_{A^*}(k_j), \qquad (10) $$

as a function of the resolution $k_{\rm max}$. This is shown in the
right panel of Figure 4: at our resolution $k_{\rm max} \sim 3000$, the
accuracy of Equation 9 is better than 5%.
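For a diagonal Gaussian covariance of the form ${\rm Cov}_{ii} = 2 P^2(k_i)/N_{k_i}$, the cumulative signal-to-noise no longer depends on the spectrum itself and reduces to half the number of modes. The short sketch below illustrates this; the spectrum values and mode counts are hypothetical.

```python
def cumulative_sn2(spectrum, n_modes):
    """(S/N)^2 = sum_i P_i [Cov^-1]_ii P_i with Cov_ii = 2 P_i^2 / N_ki (diagonal case)."""
    total = 0.0
    for p, nk in zip(spectrum, n_modes):
        cov_ii = 2.0 * p * p / nk
        total += p * p / cov_ii
    return total

spectrum = [3.0e-4, 1.2e-4, 5.0e-5]   # hypothetical P_A*(k_i)
n_modes = [40, 120, 360]              # hypothetical mode counts per bin
print(cumulative_sn2(spectrum, n_modes))   # close to sum(n_modes) / 2
```

Any departure of the exact signal-to-noise from this value therefore measures the importance of the off-diagonal terms neglected in the diagonal model.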


3 THE 2D GALAXY FIELD COVARIANCE 
MATRIX 

As the galaxy power spectrum is among the most widely
used statistics to extract information about cosmological
parameters in large-scale structure surveys, it is worth
quantifying how much information one can expect on a
given parameter as a function of the survey characteristics
and comparing it to the total information on this parameter
available from the data set.

Figure 2. Scatter plot of A* as a function of A for two different values of $\bar{N}$ using $n_{\rm cells} = 128^2$. The red lines represent the cell values
for which A* = A. When the survey is dense, i.e. when $\bar{N}$ is large (right panel), there is almost no bias between A and A*. However, for
a low density survey (left panel), A* tends to be smaller than A, leading to a bias between the two quantities.

Figure 3. On the left panel, the solid black lines represent the predictions of the power spectra of the underlying fields $\delta_g$ and $A = \ln(1 + \delta_g)$
in the redshift bin 0.6 < z < 0.8. The dashed red line is the result obtained using Equation 3. On the right panel, predictions of both the
galaxy angular power spectrum $P_g(k)$ and the non-linear transform A* power spectrum $P_{A^*}(k)$ in the redshift bin 0.6 < z < 0.8. The
dashed red line is the result obtained using Equation 5. On both panels, the grey areas show the $1\sigma$ confidence regions and the inset
panels show the deviation of our model from the true value.


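Carron et al. (2014b) developed a useful, approximate
form of the matter power spectrum covariance matrix in the mildly non-linear
regime based on previous studies from N-body simulations
(Neyrinck 2011; Mohammed & Seljak 2014):

$$ {\rm Cov}_{ij} = \langle \hat{P}(k_i) \hat{P}(k_j) \rangle - \langle \hat{P}(k_i) \rangle \langle \hat{P}(k_j) \rangle = \delta_{ij} \frac{2 P^2(k_i)}{N_{k_i}} + \sigma^2_{\rm min} P(k_i) P(k_j). \qquad (11) $$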

The first term corresponds to the Gaussian covariance and
the second term approximates the shell-averaged trispectrum
of the field. It turns out that the parameter $\sigma^2_{\rm min}$ can
be interpreted as the minimum variance achievable on an
amplitude-like parameter (see Carron et al. 2014b for details).
It can be further decomposed into two contributions:

$$ \sigma^2_{\rm min} = \sigma^2_{\rm SS} + \sigma^2_{\rm IS}. \qquad (12) $$

The first term is due to the correlation between large-wavelength
“super-survey” modes and the small scales, while
the second term corresponds to the coupling between small
scales or “intra-survey” modes.

Figure 4. On the left panel, the diagonal of the A*-power spectrum covariance matrix obtained using Equation 9 as a function of
the exact value obtained with simulations. The middle panel represents the comparison between the approximate (lower right) and the
exact (upper left) values of the normalised A*-covariance matrix. The analytical formula reproduces the exact prediction at the < 10% level.
The right panel shows the squared cumulative signal-to-noise for A* obtained using the approximation of Equation 9 compared
to the exact value. At our resolution, the accuracy is better than 5%.

Here we study local density fluctuations, $\delta$, defined
with respect to the local observed density. In the particular
case of a lognormal underlying distribution, there
is a cancellation between two contributions in the covariance
matrix resulting in $\sigma^2_{\rm SS} = 0$ (see Carron et al. 2014b
for details). Thus in our study, the only significant contribution
comes from the “intra-survey” modes. We model
$\sigma^2_{\rm min}$ within the hierarchical Ansatz (Peebles 1980; Fry 1984;
Bernardeau 1996), reducing to (see details in Carron et al.
2014b):


= a]s = ORa + m,)^^ f ^kP\k) 


V 


(13) 


(4-Ra + 4J?i,) 


which decreases as the resolution increases. 

Although it has proved to be a good model in the
3-dimensional case, this approximation does not work particularly
well in our case, mostly because i) there
are projection effects as we consider 2-dimensional clustering,
and ii) we probe here more non-linear scales ($k_{\rm max} \sim 3000$).
In fact this form of the covariance matrix is known to work
up to $k \lesssim 0.8\, h\,{\rm Mpc}^{-1}$, while in our case $k_{\rm max}$ lies beyond this limit
for z = 0.7. Thus we propose a generalization introducing
a scale-dependent $\sigma_{\rm min} = \sigma_{\rm min}(k)$. The Ansatz for the
covariance matrix then becomes:

$$ {\rm Cov}_{ij} = \delta_{ij} \frac{2 \left( P(k_i) + 1/\bar{n} \right)^2}{N_{k_i}} + \sigma_{\rm min}(k_i) \sigma_{\rm min}(k_j) P(k_i) P(k_j). \qquad (14) $$
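The structure of this Ansatz, a shot-noise-boosted Gaussian diagonal plus a rank-one non-Gaussian term, can be sketched in a few lines of Python. The function name and all numerical inputs are hypothetical; in practice `sigma_min` would come from the scale-dependent model described in this section.

```python
def galaxy_covariance(power, n_modes, sigma_min, n_density):
    """Cov_ij = delta_ij * 2 (P_i + 1/n)^2 / N_ki + sigma_min_i sigma_min_j P_i P_j."""
    size = len(power)
    cov = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            # Non-Gaussian (trispectrum-like) term, present for all pairs of bins.
            cov[i][j] = sigma_min[i] * sigma_min[j] * power[i] * power[j]
        # Gaussian term with shot noise, on the diagonal only.
        cov[i][i] += 2.0 * (power[i] + 1.0 / n_density) ** 2 / n_modes[i]
    return cov

cov = galaxy_covariance(power=[2.0, 1.0], n_modes=[4, 8],
                        sigma_min=[0.1, 0.2], n_density=1.0)
for row in cov:
    print(row)
```

The off-diagonal entries vanish only when `sigma_min` is zero, which is the Gaussian limit.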

In order to estimate $\sigma_{\rm min}(k)$, we proceed as follows.
The leading term of the trispectrum of the lognormal field
is given by (see e.g. Takahashi et al. 2014):

$$ T(\mathbf{k}_i, -\mathbf{k}_i, \mathbf{k}_j, -\mathbf{k}_j) = 2 P(k_i) P(k_j) \left( P(k_i) + P(k_j) \right) + \left( P(k_i) + P(k_j) \right)^2 \left[ P(|\mathbf{k}_i + \mathbf{k}_j|) + P(|\mathbf{k}_i - \mathbf{k}_j|) \right]. \qquad (15) $$

To obtain the spectrum covariance matrix, we need to average
this expression, summing over all Fourier modes $\mathbf{k}_i$
and $\mathbf{k}_j$ in the corresponding bins of shells of the spectrum
estimator. Assuming the bin width is small enough, this averaging
affects only the term in square brackets. In the limit
of a large number of modes and infinitesimal bin width, it is
the average with respect to the angle $\theta$ between $\mathbf{k}_i$ and $\mathbf{k}_j$.


Table 1. Best-fitting slope n values and $\chi^2$/d.o.f. derived using
the fitting formula $P(k) = A_s k^n$ on the 2-dimensional field predictions
in the four redshift bins.

Redshift bin       n        $\chi^2$/d.o.f.
0.2 < z < 0.4    -1.34     0.04
0.4 < z < 0.6    -1.38     0.07
0.6 < z < 0.8    -1.44     0.21
0.8 < z < 1.0    -1.63     0.21
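A power-law slope of the kind listed in Table 1 can be obtained with a straight-line fit in log-log space. The sketch below uses an unweighted least-squares fit as a simple stand-in; the fit used for Table 1 minimises a $\chi^2$ and may weight the points differently. The multipole values and the synthetic spectrum are hypothetical.

```python
import math

def fit_power_law(k_values, p_values):
    """Unweighted least-squares fit of ln P = ln A_s + n ln k; returns (A_s, n)."""
    xs = [math.log(k) for k in k_values]
    ys = [math.log(p) for p in p_values]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    amplitude = math.exp(mean_y - slope * mean_x)
    return amplitude, slope

k_values = [100.0, 300.0, 1000.0, 3000.0]            # hypothetical multipoles
p_values = [2.0e-3 * k ** -1.44 for k in k_values]   # synthetic spectrum, n = -1.44
print(fit_power_law(k_values, p_values))
```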


On the diagonal ($k_i = k_j = k$) it takes the form


1 


' (^\/2 + 2 cos 


dO. 


(16) 


It appears that for the relevant case of pure power-law spectra
$P \propto A_s k^n$ with a realistic exponent, this integral diverges.
We can correct this by separating the (in the real world finite
and negligible) contribution of the background mode
fluctuation P(0) from the rest, which we still treat in the
continuous limit, i.e. we write


1 



As{2k^{l A cos(9)))"''^dB. 


(17) 


The cutoff parameter $\epsilon = \arctan(1/n_i)$, where $n_i = k_i L/(2\pi)$,
is such that the integral starts at the first non-zero
mode of the discrete Fourier modes associated with the
grid. The integration gives:

^n + l 

— P(fc)B_ 2 (,/ 2 )(l/ 2 , (n + l)/2) (18) 

where B is the incomplete beta function. According to Equation 15,
we have on the diagonal of the covariance matrix of the
galaxy field:


^hinih) = TiilP{kif - 


(19) 


To estimate the slope n, we fit a power law to the 2-dimensional
power spectrum and use the best-fit values presented
in Table 1.

Figure 5. On the left panel, the comparison between the value of $\sigma^2_{\rm min}(k)$ predicted using the exact matter covariance matrix
from simulations and Equation 19 (red dashed line). As a comparison the dot-dashed line shows the value given by Equation 13
from Carron et al. (2014b). Our approximation reproduces well the shape of the true value, within 30% at high k and 10% at low k. The
middle panel shows the comparison between the exact (upper left corner) and the approximate (lower right corner) values of the matter
covariance matrix. The right panel shows the squared cumulative signal-to-noise obtained using the approximation of Equation 14 for the
galaxy power spectrum covariance matrix compared to the exact value. At our resolution, the accuracy is within ~10%.


The left panel of Figure 5 shows the comparison between
the value of $\sigma^2_{\rm min}(k)$ predicted using the exact matter
covariance matrix and our model. The red dashed line shows $\sigma^2_{\rm min}(k)$ given
by Equation 19 and, as a comparison, the dot-dashed line
shows the value given by Equation 13. Our approximation
reproduces well the shape of the true value, within 30% at
high k and 10% at low k. The middle panel of Figure 5 illustrates
the comparison between the exact and the approximate
covariance matrix, while the right panel shows that the
squared cumulative signal-to-noise ratios agree within ~10%.


4 INFORMATION CONTENT 

We now have all the ingredients to quantify the Fisher
information content of the A*-power spectrum for cosmological
parameters (which in the shot-noise free regime
is very close to the total information). We can also compare
the cosmological information content of the galaxy
power spectrum to that of A*. Our analytical model only
requires the prediction of the galaxy power spectrum
P and the shot-noise level of the considered survey
through $\bar{N}$. Thus, for a given observation, we can forecast
analytically the constraints on cosmological parameters
extracted from the clustering of the underlying random field.


Given a set of parameters $\alpha, \beta, \ldots$, the Fisher matrix
of the spectrum is defined as:

$$ F_{\alpha\beta} = \sum_{k_i, k_j < k_{\rm max}} \frac{\partial P(k_i)}{\partial \alpha} \left[ {\rm Cov}_{ij} \right]^{-1} \frac{\partial P(k_j)}{\partial \beta}, \qquad (20) $$

where the covariance matrix is given by Equation 14. The
inverse of the Fisher matrix corresponds to the covariance
of the posterior distribution of the parameters that could be
obtained given the error bars one has on the data. This means
that the larger the value of a Fisher matrix coefficient,
the smaller the variance, and therefore the tighter the
constraint on the parameter.
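The link between the Fisher matrix and parameter constraints can be illustrated with a toy two-parameter case. The sketch below builds the Fisher matrix from spectrum derivatives and an inverse covariance, then inverts it to obtain marginalized errors. The function names and all numbers are hypothetical; the inverse covariance is supplied directly rather than computed from Equation 14.

```python
def fisher_matrix(derivs, inv_cov):
    """F_ab = sum_ij dP_i/da [Cov^-1]_ij dP_j/db; derivs[a][i] is dP(k_i)/da."""
    n_par, n_bins = len(derivs), len(inv_cov)
    return [[sum(derivs[a][i] * inv_cov[i][j] * derivs[b][j]
                 for i in range(n_bins) for j in range(n_bins))
             for b in range(n_par)] for a in range(n_par)]

def marginalized_errors_2x2(fisher):
    """1-sigma errors from the diagonal of the inverse of a 2x2 Fisher matrix."""
    (f00, f01), (f10, f11) = fisher
    det = f00 * f11 - f01 * f10
    return ((f11 / det) ** 0.5, (f00 / det) ** 0.5)

derivs = [[1.0, 2.0], [0.5, -1.0]]     # hypothetical dP(k_i)/d(alpha), dP(k_i)/d(beta)
inv_cov = [[4.0, 0.0], [0.0, 1.0]]     # hypothetical inverse covariance
fisher = fisher_matrix(derivs, inv_cov)
print(fisher, marginalized_errors_2x2(fisher))
```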


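The information content from A* is given by:

$$ F^{A^*}_{\alpha\beta} = \sum_{k_i, k_j < k_{\rm max}} \frac{\partial P_{A^*}(k_i)}{\partial \alpha} \left[ {\rm Cov}^{A^*}_{ij} \right]^{-1} \frac{\partial P_{A^*}(k_j)}{\partial \beta}. \qquad (21) $$

Thus Equation 9 leads to

$$ {\rm Cov}^{A^*}_{ij} = \frac{2}{N_{k_i}} b^4_{A^*} \left( P(k_i) + \frac{1}{\bar{n}} \right) \left( P(k_j) + \frac{1}{\bar{n}} \right) \delta_{ij}. \qquad (22) $$

Equation 21 then becomes:

$$ F^{A^*}_{\alpha\beta} = \sum_{k < k_{\rm max}} \frac{N_k}{2} \frac{\partial \ln P_{A^*}(k)}{\partial \alpha} \frac{\partial \ln P_{A^*}(k)}{\partial \beta}, \qquad (23) $$

with

$$ \frac{\partial \ln P_{A^*}(k)}{\partial \alpha} = \frac{\partial \ln b^2_{A^*}}{\partial \alpha} + \frac{\partial \ln P_g(k)}{\partial \alpha} = \frac{\partial \ln b^2_A}{\partial \alpha} + \frac{\partial P(k)}{\partial \alpha} \frac{1}{P(k) + 1/\bar{n}}. \qquad (24) $$

The bias coming from the A* mapping is fixed by the
fiducial values of the HOD and cosmology and thus does not
carry a cosmological dependence.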
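The effect of the bias derivative on the A* information can be seen in a small numerical sketch. It assumes a diagonal Gaussian covariance, so that the Fisher matrix is a sum over bins of $N_k/2$ times the product of logarithmic derivatives, and that each logarithmic derivative is the sum of a scale-independent bias term and the shot-noise-damped derivative of the galaxy spectrum. Function names and numbers are hypothetical.

```python
def dlnp_astar(dlnb2, dp, power, n_density):
    """d ln P_A*/da = d ln b^2/da + (dP/da) / (P + 1/n)."""
    return dlnb2 + dp / (power + 1.0 / n_density)

def fisher_astar(n_modes, power, dp_a, dp_b, dlnb2_a, dlnb2_b, n_density):
    """F_ab = sum_k (N_k / 2) dlnP_A*/da dlnP_A*/db for a diagonal Gaussian covariance."""
    total = 0.0
    for nk, p, da, db in zip(n_modes, power, dp_a, dp_b):
        total += 0.5 * nk * (dlnp_astar(dlnb2_a, da, p, n_density)
                             * dlnp_astar(dlnb2_b, db, p, n_density))
    return total

n_modes = [2, 4]                     # hypothetical mode counts
power = [1.0, 1.0]                   # hypothetical P(k)
dp = [1.0, 1.0]                      # hypothetical dP/d(alpha)
print(fisher_astar(n_modes, power, dp, dp, 0.0, 0.0, n_density=1.0))   # no bias term
print(fisher_astar(n_modes, power, dp, dp, 0.5, 0.5, n_density=1.0))   # with bias term
```

In this toy case a bias derivative of the same sign as the spectrum derivative increases the Fisher information, which is the mechanism discussed in the text.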


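Finally:

$$ F^{A^*}_{\alpha\beta} = F^{G}_{\alpha\beta} + \frac{\partial \ln b_A^2}{\partial \alpha} \frac{\partial \ln b_A^2}{\partial \beta} (S/N)^2_G + \frac{\partial \ln b_A^2}{\partial \alpha} F_{\beta, \ln A_0} + \frac{\partial \ln b_A^2}{\partial \beta} F_{\alpha, \ln A_0}, \qquad (25) $$

where:

$$ F^{G}_{\alpha\beta} = \int d\ln k \; w(k) \frac{\partial \ln P(k)}{\partial \alpha} \frac{\partial \ln P(k)}{\partial \beta}, \qquad (26) $$

with

$$ w(k) = \frac{V k^2}{4\pi} \left( \frac{\bar{n} P(k)}{\bar{n} P(k) + 1} \right)^2, \qquad (27) $$

corresponding to the usual formula from Tegmark (1997).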
We have replaced the discrete sums with integrals using the
fact that the number of modes $N_k$ is approximately the surface
of the shell used for the bin averaging divided by the
distance element between two discrete modes. With our convention:

$$ N_k \simeq V \frac{2\pi k \, dk}{(2\pi)^2}. \qquad (28) $$

Figure 6. Fisher information of the A* (red lines) and galaxy power spectrum (black lines) as a function of the shot-noise level (through
the sampling rate $\bar{N}$) for the two cosmological parameters $\sigma_8$ (left panel) and $w_0$ (right panel) in the four redshift bins 0.2 < z < 0.4,
0.4 < z < 0.6, 0.6 < z < 0.8 and 0.8 < z < 1.0. For $\bar{N}$ > 5-7, we see that the non-linear transform A* performs better than the galaxy
power spectrum over the whole range of number densities and redshifts to extract information on cosmological parameters. It can also
be seen that the A*-power spectrum is more powerful at low redshifts, where the non-linearities are stronger, and for dense surveys
(i.e. for large values of $\bar{N}$).


Moreover, in Equation 25, by analogy with Carron et al.
(2014b), we have introduced a nonlinear amplitude parameter
$\ln A_0$ defined such that $\partial_{\ln A_0} P(k) = P(k)$. This parameter
corresponds to the initial amplitude $\sigma_8^2$ in the linear regime
and at z = 0. We further define the Gaussian signal-to-noise
as:

$$ (S/N)^2_G = \int d\ln k \, \frac{V k^2}{4\pi}, \qquad (29) $$

corresponding to a case without shot noise. The derivatives
are estimated numerically using the CosmoPMC package.
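The passage from discrete sums to integrals can be verified on the Gaussian signal-to-noise. The sketch below assumes the flat-sky mode count $N_k \simeq V \, 2\pi k \, dk / (2\pi)^2$ and sums $N_k/2$ over linear bins, then compares with the closed form $V (k_{\rm max}^2 - k_{\rm min}^2)/(8\pi)$ of $\int d\ln k \, V k^2/(4\pi)$. Taking V as the solid angle of a CFHTLS-W1-sized square field and starting at its fundamental mode are illustrative assumptions.

```python
import math

def n_modes(area, k, dk):
    """Flat-sky mode count in a shell: N_k ~ V * 2 pi k dk / (2 pi)^2."""
    return area * 2.0 * math.pi * k * dk / (2.0 * math.pi) ** 2

def gaussian_sn2(area, k_min, k_max, n_bins=1000):
    """Sum of N_k / 2 over linear bins, the discrete form of int dlnk V k^2 / (4 pi)."""
    dk = (k_max - k_min) / n_bins
    return sum(0.5 * n_modes(area, k_min + (b + 0.5) * dk, dk) for b in range(n_bins))

side = math.radians(7.46)        # field side in radians
area = side ** 2                 # solid angle in steradians (flat sky)
k_min = 2.0 * math.pi / side     # fundamental mode of the field
print(gaussian_sn2(area, k_min, 3000.0))
print(area * (3000.0**2 - k_min**2) / (8.0 * math.pi))   # closed form of the integral
```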

The panels of Figure 6 show the A*- and galaxy power
spectrum Fisher information as a function of $\bar{N}$ for the two
cosmological parameters $\sigma_8$ (left panel) and $w_0$ (right panel)
in the four redshift bins 0.2 < z < 0.4, 0.4 < z < 0.6,
0.6 < z < 0.8 and 0.8 < z < 1.0. We consider values of
$\bar{N}$ > 5, where A* is expected to start to perform better.
We can clearly see that for small $\bar{N}$, the shot noise erases
information present in the underlying random field. As previously
seen in Wolk et al. (2014), the analytical approach
developed in this work reproduces well the general trends
expected for the non-linear transform A*: i) for $\bar{N}$ > 5-7,
A* performs better than the galaxy power spectrum over the
whole range of number densities and redshifts and thus could
be used to unveil the otherwise hidden information, ii) the
use of the A*-power spectrum to extract the information is
more powerful at low redshifts, where the non-linearities are
stronger, and iii) our observable is more efficient for dense
surveys (i.e. for large values of $\bar{N}$).


To compare our results quantitatively with the
previous forecasts of Wolk et al. (2014), Figure 7 shows the
predicted improvement in the information on $w_0$ (left panel)
and $\sigma_8$ (right panel) as a function of redshift and the survey
shot-noise level. The quantity plotted is the ratio between
the galaxy and A*-power spectrum Fisher matrix elements.
In that sense, it represents the expected information gain
from using the non-linear transform A* instead of the power
spectrum. The simplest interpretation of this gain is an effective
gain in survey area. We also show, for illustrative purposes,
the values of $\bar{N}$ for different upcoming surveys in the first
redshift bin, where the gain is known to be the highest (see
Wolk et al. 2014 for details).

This analytical approach reproduces to better than 20%
the expected gain for the two parameters $\sigma_8$ and $w_0$. Qualitatively,
the achievable gain is about a factor of 2, especially
at low redshifts and for dense surveys. We conclude
that the analytical model developed here, using the matter
power spectrum at a redshift z and the number density of
the survey, is able to predict the constraints on cosmological
parameters from galaxy clustering with reasonable precision.
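To relate an information gain to error bars, recall that an unmarginalized 1-sigma error scales as the inverse square root of the corresponding Fisher element. The sketch below converts a ratio of Fisher elements into a ratio of errors; the function name and input values are hypothetical.

```python
def error_improvement(fisher_astar, fisher_galaxy):
    """Return (information gain, ratio of unmarginalized 1-sigma errors)."""
    gain = fisher_astar / fisher_galaxy
    return gain, gain ** -0.5

gain, error_ratio = error_improvement(fisher_astar=2.0e4, fisher_galaxy=1.0e4)
print(gain, error_ratio)   # a factor-2 gain shrinks the error bar by 1/sqrt(2)
```

Since the Gaussian Fisher information scales linearly with the survey area, a gain of 2 is equivalent, in this simple picture, to doubling the area surveyed.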


5 DISCUSSION 

It has been known that non-linear transforms help to capture
more efficiently the information encoded in the matter
density field. The notion of sufficient statistics (Carron &
Szapudi 2013) has emerged as the optimal transformation
that extracts all cosmological information. In the case of a
discrete galaxy field the new observable A* was constructed
(Carron & Szapudi 2014). Wolk et al. (2014) forecasted,
using a numerical approach, the expected improvement in
constraints beyond that of using the galaxy power spectrum
on the latest CFHTLS data set as well as on upcoming
large wide-field surveys; for the former, the forecast agreed
well with the actual gain realized when calculating the
A* power spectrum. In this work, we have developed an
analytical approach that captures the statistics of A* to the
point that we could accurately forecast the best achievable
constraints on cosmological parameters as a function of the
survey density. The forecast improvement is consistent with
previous, more tedious, numerical calculations at the 20%
level at worst (or 10% for error bars).

Figure 7. Information gains on the cosmological parameters $w_0$ (left panel) and $\sigma_8$ (right panel) using the A* power spectrum instead
of the galaxy power spectrum as a function of the shot-noise level in the survey (through the sampling rate $\bar{N}$). We recover the gain
predicted numerically in Wolk et al. (2014) (black crosses), which was about a factor of 2, especially at low redshifts and for dense surveys.
The vertical lines illustrate the values of $\bar{N}$ for different upcoming surveys in the first redshift bin 0.2 < z < 0.4.

We have presented an Ansatz for the bias between 
the galaxy and A*-power spectra, and demonstrated its 
accuracy compared to the previous numerical approach. 
We showed that the dependence of the bias on cosmology 
is crucial for endowing A* with the ability to recapture the 
hidden information from the field. In addition, we proposed 
a diagonal form for the A* power spectrum covariance 
matrix and showed that it is accurate at the 5% level. 


In order to compare with the standard method of extracting
cosmological parameters from the galaxy power
spectrum, we have provided and explored the accuracy of
an analytical Ansatz for the projected power spectrum
covariance matrix. Based on a generalization of Carron et al.
(2014b), we were able to reproduce the squared cumulative
signal-to-noise of the matter field within 10%.


Although our analytical framework contains a fair number 
of approximations, we have demonstrated that our forecasts 
are reliable at least within 20% even at the most non-linear 
scales we probed. Moreover, our method includes all the 
non-Gaussian effects (super survey modes, trispectrum, 
discreteness) and thus it is expected to be more accurate 
than the standard Gaussian forecasts entirely ignoring such 
effects. In addition, the approach has also provided new 


insights and a deeper understanding of the cosmological 
information content of the galaxy clustering. 


Finally, we predicted the best achievable constraints
on the cosmological parameters $\sigma_8$ and $w_0$ as a function
of the shot-noise level in the survey. We were able to
recover the predictions of Wolk et al. (2014), obtained using a
large ensemble of numerical simulations, and found that
the information gain from using the A*-power spectrum
amounts to approximately a factor of 2, especially at
low redshifts and for dense surveys.

The promise of A* for improving cosmological constraints
from future surveys has been clear for a while. However,
until now, the prediction of its power spectrum involved
a large number of numerical simulations, a disadvantage
when used in an MCMC sampling framework to fit cosmological
parameters. Likewise, the corresponding covariance
matrices also needed a massive number of simulations. The
present work provides a convenient and accurate shortcut
that can be used at least for forecasting purposes, and it has
the potential of speeding up MCMC sampling as well. The
present approximations have been tested for 2-dimensional
projected surveys, but similar developments can be carried
out for 3-dimensional surveys as well. Previous attempts
have been made using dark matter simulations; however,
it is worth mentioning that the information gain is volume
dependent and changes with respect to a local or global
description (i.e. whether or not we consider density fluctuations
defined with respect to the local observed density, see
Carron et al. 2014b). Neyrinck et al. (2009), using the local
density field from the 500 $h^{-1}$ Mpc Millennium simulation
(Springel 2005), found a factor of ~10 improvement in the
information on the $(S/N)^2$ (corresponding approximately
to $\ln(\sigma_8^2)$ unmarginalized over the other cosmological
parameters) using sufficient statistics. Neyrinck (2011),
doing the same analysis on the Coyote Universe simulations (Heitmann
et al. 2009, 2010), which have a box size of 1300 Mpc,
found an improvement of ~15. More recently, Wolk
et al. (2015), considering the constraints on neutrino mass
from the DEMNUNI simulation of volume $V = 8\, h^{-3}\,{\rm Gpc}^3$
using both the power spectrum and sufficient statistics,
found a factor of ~8 improvement in the information. A
similar framework for 3-dimensional surveys,
including the effects of redshift-space distortions both on
the power spectra and on covariance matrices, would be
desirable for applications and is left for future work.